Improving the standard of care for diseases is predicated on better treatments, which in turn relies upon finding and developing new drugs. However, drug discovery is a complex and costly process. Machine learning approaches have been adopted that exploit the inherently interconnected nature of the domain through the creation of drug discovery knowledge graphs. Graph-based data modelling, combined with knowledge graph embeddings, provides a more intuitive representation of the domain that is suitable for inference tasks such as predicting missing links. One such example would be to produce a ranked list of genes likely to be associated with a given disease, a task often referred to as target discovery. It is therefore critical that these predictions are not only pertinent but also biologically meaningful. However, knowledge graphs can be biased, either directly due to the underlying data sources that are integrated, or through modelling choices made during graph construction, one consequence of which is that certain entities can become topologically overrepresented. We show that knowledge graph embedding models can be affected by this structural imbalance, causing densely connected entities to be highly ranked regardless of context. We provide support for this observation across different datasets, models, and prediction tasks. Further, we show how the graph topology can be perturbed with random, biologically meaningless information to artificially alter the rank of a gene. This suggests that such models may be influenced more by entity frequency than by the biological information encoded in the relations, which is problematic when entity frequency is not a true reflection of the underlying data. Our results highlight the importance of data modelling choices, and emphasise that practitioners need to be aware of these issues both when interpreting model outputs and during knowledge graph construction.
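To make the frequency confound concrete, the following sketch (with invented entity names and triples) implements the degree-only baseline that such rankings should be sanity-checked against: if an embedding model's gene ranking closely tracks this baseline, topology rather than biology may be driving the predictions.

```python
# Minimal sketch (hypothetical data): a pure node-degree "model" for
# gene ranking. It ranks the same genes highly for every disease,
# which is exactly the failure mode described above.
from collections import Counter

triples = [  # (head, relation, tail) -- illustrative only
    ("GENE_A", "associated_with", "DISEASE_X"),
    ("GENE_A", "interacts_with", "GENE_B"),
    ("GENE_A", "targeted_by", "DRUG_1"),
    ("GENE_B", "associated_with", "DISEASE_Y"),
    ("GENE_C", "associated_with", "DISEASE_Y"),
]

# Node degree = how often an entity appears anywhere in the graph.
degree = Counter()
for h, _, t in triples:
    degree[h] += 1
    degree[t] += 1

# Rank candidate genes for any disease purely by degree.
genes = ["GENE_A", "GENE_B", "GENE_C"]
ranking = sorted(genes, key=lambda g: degree[g], reverse=True)
print(ranking)  # GENE_A first, regardless of which disease is queried
```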
The drug discovery and development process is long and costly, with each drug on average taking more than a billion dollars and 10 to 15 years to bring to market. To reduce the high level of attrition throughout the process, machine learning approaches have increasingly been applied to the various stages of drug discovery and development over the last decade, especially at the earliest stage of identifying druggable disease genes. In this paper, we develop a new tensor factorization model to predict potential drug targets (genes or proteins) for the treatment of diseases. We created a three-dimensional data tensor consisting of 1,048 gene targets, 860 diseases, and 230,011 evidence attributes and clinical outcomes, using data extracted from the Open Targets and PharmaProjects databases. We enriched the data with gene-target representations learned from a drug discovery knowledge graph and applied our proposed method to predict the clinical outcomes of unseen gene-target and disease pairs. We designed three evaluation strategies to measure prediction performance and benchmarked several commonly used machine learning classifiers alongside Bayesian matrix and tensor factorization methods. The results show that incorporating knowledge graph embeddings significantly improves prediction accuracy, and that training the tensor factorization jointly with a dense neural network outperforms all other baselines. In summary, our framework combines two actively studied machine learning approaches to disease-target identification, namely tensor factorization and knowledge graph representation learning, which could be a promising avenue for further exploration in data-driven drug discovery.
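As a rough illustration of the core mechanism, here is a minimal CP-style three-way factorization sketch in PyTorch. The target and disease dimensions mirror the tensor described above, but the attribute dimension, training data, and loss are placeholders; the authors' exact model and its dense-network variant are not reproduced.

```python
# CP-style tensor factorization sketch (illustrative only).
import torch

n_targets, n_diseases, n_attrs, rank = 1048, 860, 64, 32  # n_attrs reduced for the sketch
T = torch.nn.Parameter(torch.randn(n_targets, rank) * 0.01)
D = torch.nn.Parameter(torch.randn(n_diseases, rank) * 0.01)
A = torch.nn.Parameter(torch.randn(n_attrs, rank) * 0.01)
opt = torch.optim.Adam([T, D, A], lr=1e-2)

# Observed entries as (target_idx, disease_idx, attr_idx, value) tuples.
obs = [(0, 1, 2, 1.0), (5, 3, 0, 0.0)]  # placeholder data
idx = torch.tensor([o[:3] for o in obs])
y = torch.tensor([o[3] for o in obs])

for step in range(200):
    opt.zero_grad()
    # CP reconstruction: elementwise product of factor rows, summed over rank.
    pred = (T[idx[:, 0]] * D[idx[:, 1]] * A[idx[:, 2]]).sum(-1)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(pred, y)
    loss.backward()
    opt.step()

# Score an unseen (target, disease) pair against a given outcome attribute.
score = torch.sigmoid((T[10] * D[20] * A[0]).sum())
```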
Drug discovery and development is a complex and costly process. Machine learning approaches are being investigated to help improve the effectiveness and speed of multiple stages of the drug discovery pipeline. Among these, approaches that use knowledge graphs (KGs) have shown promise in many tasks, including drug repurposing, drug toxicity prediction, and target gene-disease prioritisation. In a drug discovery KG, crucial elements such as genes, diseases, and drugs are represented as entities, whilst relationships between them indicate interactions. However, constructing a high-quality KG requires suitable data. In this review, we detail publicly available sources suitable for building drug discovery-focused KGs. We aim to help guide machine learning and KG practitioners who wish to apply new techniques to the drug discovery field but who may be unfamiliar with the relevant data sources. The datasets are selected via strict criteria, categorised according to the primary type of information they contain, and classified based on the information that can be extracted from them to build a KG. We then present a comparative analysis of existing public drug discovery KGs and evaluate a selected motivating case study from the literature. Additionally, we raise numerous challenges and issues associated with the domain and its datasets, while highlighting key future research directions. We hope this review will motivate the use of KGs for key and emerging problems in the drug discovery domain.
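For readers new to the representation, a drug discovery KG is essentially a typed, directed multigraph built from (head, relation, tail) triples. A minimal sketch with networkx, using invented triples rather than any specific source reviewed here:

```python
# Assembling a toy drug discovery KG from heterogeneous triples.
import networkx as nx

triples = [
    ("DRUG:aspirin", "targets", "GENE:PTGS2"),
    ("GENE:PTGS2", "associated_with", "DISEASE:inflammation"),
    ("DRUG:aspirin", "indicated_for", "DISEASE:inflammation"),
]

kg = nx.MultiDiGraph()
for h, r, t in triples:
    # Entity type is recoverable from the "TYPE:name" prefix convention.
    kg.add_node(h, type=h.split(":")[0])
    kg.add_node(t, type=t.split(":")[0])
    kg.add_edge(h, t, relation=r)

print(kg.number_of_nodes(), kg.number_of_edges())
```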
Modelling and forecasting real-life human behaviour using online social media is an active endeavour of interest in politics, government, academia, and industry. Since its creation in 2006, Twitter has been proposed as a potential laboratory that could be used to gauge and predict social behaviour. During the last decade, the user base of Twitter has been growing and becoming more representative of the general population. Here we analyse this user base in the context of the 2021 Mexican Legislative Election. To do so, we use a dataset of 15 million election-related tweets in the six months preceding election day. We explore different election models that assign political preference to either the ruling parties or the opposition. We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods. These results demonstrate that analysis of public online data can outperform conventional polling methods, and that political analysis and general forecasting would likely benefit from incorporating such data in the immediate future. Moreover, the same Twitter dataset with geographical attributes is positively correlated with results from official census data on population and internet usage in Mexico. These findings suggest that we have reached a period in time when online activity, appropriately curated, can provide an accurate representation of offline behaviour.
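As a purely illustrative sketch of how such an estimate might be aggregated (the field names, the preference-labelling step, and the weighting are assumptions, not the paper's actual pipeline), one could weight per-state tweet shares by population:

```python
# Hypothetical aggregation: per-state preference shares from labelled
# tweets, combined into a population-weighted national estimate.
import pandas as pd

tweets = pd.DataFrame({
    "state":      ["CDMX", "CDMX", "Jalisco", "Jalisco", "Jalisco"],
    "preference": ["ruling", "opposition", "ruling", "ruling", "opposition"],
})

shares = (tweets.groupby("state")["preference"]
                .value_counts(normalize=True)
                .unstack(fill_value=0))
population = pd.Series({"CDMX": 9.2e6, "Jalisco": 8.3e6})  # placeholder figures
national = shares.mul(population, axis=0).sum() / population.sum()
print(national)  # estimated national vote shares
```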
Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
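The greedy policy itself is simple to state; the sketch below assumes oracle access to a conditional mutual information estimate, which is exactly the quantity the proposed method amortizes with a learned network. The `cmi_oracle` and `acquire_feature` hooks are placeholders.

```python
# Greedy dynamic feature selection under an assumed CMI oracle.
def greedy_dfs(cmi_oracle, n_features, budget, observed=None):
    observed = dict(observed or {})  # feature index -> observed value
    for _ in range(budget):
        candidates = [i for i in range(n_features) if i not in observed]
        # Score each unqueried feature i by I(x_i ; y | x_observed).
        scores = {i: cmi_oracle(i, observed) for i in candidates}
        best = max(scores, key=scores.get)
        observed[best] = acquire_feature(best)  # query the real value
    return observed

def acquire_feature(i):
    # Stand-in for the (possibly costly) acquisition of feature i.
    return 0.0
```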
Remote sensing of the Earth's surface water is critical in a wide range of environmental studies, from evaluating the societal impacts of seasonal droughts and floods to the large-scale implications of climate change. Consequently, a large literature exists on the classification of water from satellite imagery. Yet, previous methods have been limited by 1) the spatial resolution of public satellite imagery, 2) classification schemes that operate at the pixel level, and 3) the need for multiple spectral bands. We advance the state-of-the-art by 1) using commercial imagery with panchromatic and multispectral resolutions of 30 cm and 1.2 m, respectively, 2) developing multiple fully convolutional neural networks (FCN) that can learn the morphological features of water bodies in addition to their spectral properties, and 3) FCN that can classify water even from panchromatic imagery. This study focuses on rivers in the Arctic, using images from the Quickbird, WorldView, and GeoEye satellites. Because no training data are available at such high resolutions, we construct those manually. First, we use the RGB, and NIR bands of the 8-band multispectral sensors. Those trained models all achieve excellent precision and recall over 90% on validation data, aided by on-the-fly preprocessing of the training data specific to satellite imagery. In a novel approach, we then use results from the multispectral model to generate training data for FCN that only require panchromatic imagery, of which considerably more is available. Despite the smaller feature space, these models still achieve a precision and recall of over 85%. We provide our open-source codes and trained model parameters to the remote sensing community, which paves the way to a wide range of environmental hydrology applications at vastly superior accuracies and 2 orders of magnitude higher spatial resolution than previously possible.
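As a minimal sketch of the panchromatic setting (the architecture is illustrative and far smaller than the models described above), a fully convolutional network maps a single-band tile to a per-pixel water probability:

```python
# Tiny FCN for binary water masking from one panchromatic band.
import torch
import torch.nn as nn

class TinyFCN(nn.Module):
    def __init__(self, in_bands=1):  # 1 for panchromatic, 4 for RGB+NIR
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_bands, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, 1),  # per-pixel water logit
        )

    def forward(self, x):   # x: (batch, bands, H, W)
        return self.net(x)  # logits: (batch, 1, H, W)

model = TinyFCN(in_bands=1)
tile = torch.randn(1, 1, 256, 256)             # one panchromatic tile
water_mask = torch.sigmoid(model(tile)) > 0.5  # boolean water mask
```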
Many real-world reinforcement learning tasks require control of complex dynamical systems that involve both costly data acquisition processes and large state spaces. In cases where the transition dynamics can be readily evaluated at specified states (e.g., via a simulator), agents can operate in what is often referred to as planning with a \emph{generative model}. We propose the AE-LSVI algorithm for best-policy identification, a novel variant of the kernelized least-squares value iteration (LSVI) algorithm that combines optimism with pessimism for active exploration (AE). AE-LSVI provably identifies a near-optimal policy \emph{uniformly} over an entire state space and achieves polynomial sample complexity guarantees that are independent of the number of states. When specialized to the recently introduced offline contextual Bayesian optimization setting, our algorithm achieves improved sample complexity bounds. Experimentally, we demonstrate that AE-LSVI outperforms other RL algorithms in a variety of environments when robustness to the initial state is required.
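Schematically, the optimism-pessimism idea can be sketched as follows: with a generative model, the agent queries the state-action pair where the gap between optimistic and pessimistic value estimates is largest, refits, and repeats until the gap is uniformly small. The confidence bounds here are placeholders for the kernelized LSVI estimates.

```python
# Active-exploration query rule (schematic; bounds are placeholders).
import numpy as np

def select_query(states, actions, q_upper, q_lower):
    """q_upper/q_lower: arrays of shape (n_states, n_actions) holding
    optimistic and pessimistic Q-estimates."""
    gap = q_upper - q_lower  # epistemic uncertainty proxy
    s, a = np.unravel_index(np.argmax(gap), gap.shape)
    return states[s], actions[a]

# The agent evaluates the transition at the chosen pair via the
# generative model, refits the kernel regression, and repeats; a
# uniformly small gap certifies near-optimality over all states.
```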
Natural language interaction is a promising direction for democratizing 3D shape design. However, existing methods for text-driven 3D shape editing face challenges in producing decoupled, local edits to 3D shapes. We address this problem by learning disentangled latent representations that ground language in 3D geometry. To this end, we propose a complementary tool set including a novel network architecture, a disentanglement loss, and a new editing procedure. Additionally, to measure edit locality, we define a new metric that we call part-wise edit precision. We show that our method outperforms existing SOTA methods by 20% in terms of edit locality, and up to 6.6% in terms of language reference resolution accuracy. Our work suggests that by solely disentangling language representations, downstream 3D shape editing can become more local to relevant parts, even if the model was never given explicit part-based supervision.
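A measure in the spirit of part-wise edit precision might be sketched as follows (the paper's exact definition may differ): of all the points that moved under an edit, what fraction belongs to the part the instruction referred to?

```python
# Sketch of a part-wise edit locality measure over point sets.
import numpy as np

def partwise_edit_precision(before, after, part_mask, tol=1e-3):
    """before/after: (N, 3) point sets; part_mask: (N,) bool marking
    points on the part named in the edit instruction."""
    moved = np.linalg.norm(after - before, axis=1) > tol
    if moved.sum() == 0:
        return 1.0  # nothing moved outside the part either
    return (moved & part_mask).sum() / moved.sum()
```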
Neural networks have revolutionized the area of artificial intelligence and introduced transformative applications to almost every scientific field and industry. However, this success comes at a great price; the energy requirements for training advanced models are unsustainable. One promising way to address this pressing issue is by developing low-energy neuromorphic hardware that directly supports the algorithm's requirements. The intrinsic non-volatility, non-linearity, and memory of spintronic devices make them appealing candidates for neuromorphic devices. Here we focus on the reservoir computing paradigm, a recurrent network with a simple training algorithm suitable for computation with spintronic devices since they can provide the properties of non-linearity and memory. We review technologies and methods for developing neuromorphic spintronic devices and conclude with critical open issues to address before such devices become widely used.
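A minimal software echo state network makes the appeal concrete: only the linear readout is trained, so a physical substrate with non-linearity and memory, such as a spintronic device, can replace the simulated reservoir. The task and hyperparameters below are placeholders.

```python
# Echo state network sketch: fixed random reservoir, trained readout.
import numpy as np

rng = np.random.default_rng(0)
n_res, n_in, T = 200, 1, 500
W_in = rng.normal(size=(n_res, n_in)) * 0.5
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # enforce echo state property

u = rng.normal(size=(T, n_in))  # input sequence (placeholder task)
y = np.roll(u[:, 0], 1)         # target: recall the previous input
X = np.zeros((T, n_res))
x = np.zeros(n_res)
for t in range(T):
    x = np.tanh(W @ x + W_in @ u[t])  # fixed, untrained dynamics
    X[t] = x

# Train only the readout, via ridge regression.
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
pred = X @ W_out
```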
We present a method for controlling a swarm using its spectral decomposition -- that is, by describing the set of trajectories of a swarm in terms of a spatial distribution throughout the operational domain -- guaranteeing scale invariance with respect to the number of agents both for computation and for the operator tasked with controlling the swarm. We use ergodic control, decentralized across the network, for implementation. In the DARPA OFFSET program field setting, we test this interface design for the operator using the STOMP interface -- the same interface used by Raytheon BBN throughout the duration of the OFFSET program. In these tests, we demonstrate that our approach is scale-invariant -- the user specification does not depend on the number of agents; it is persistent -- the specification remains active until the user specifies a new command; and it is real-time -- the user can interact with and interrupt the swarm at any time. Moreover, we show that the spectral/ergodic specification of swarm behavior degrades gracefully as the number of agents goes down, enabling the operator to maintain the same approach as agents become disabled or are added to the network. We demonstrate the scale-invariance and dynamic response of our system in a field relevant simulator on a variety of tactical scenarios with up to 50 agents. We also demonstrate the dynamic response of our system in the field with a smaller team of agents. Lastly, we make the code for our system available.
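To illustrate the spectral specification (in simplified, centralized form; the normalization and decentralized implementation follow the general ergodic control literature rather than this exact system), a target spatial distribution is summarized by cosine-basis Fourier coefficients, and the swarm's time-averaged positions are compared against them with a mode-weighted metric that is independent of the number of agents:

```python
# Spectral/ergodic specification sketch on the unit square.
import numpy as np

K = 5  # number of Fourier modes per axis
ks = np.arange(K)

def fourier_coeffs(points):
    """Cosine-basis coefficients of the empirical distribution of
    points (N, 2) on the unit square."""
    cx = np.cos(np.pi * np.outer(points[:, 0], ks))  # (N, K)
    cy = np.cos(np.pi * np.outer(points[:, 1], ks))
    return np.einsum("ni,nj->ij", cx, cy) / len(points)  # (K, K)

# Ergodic metric: weighted distance between the target coefficients and
# those of the swarm's time-averaged positions -- the same specification
# works for any number of agents (scale invariance).
target = fourier_coeffs(np.random.rand(10000, 2))    # e.g. uniform coverage
swarm = fourier_coeffs(np.random.rand(50 * 200, 2))  # 50 agents x 200 steps
weights = 1.0 / (1.0 + np.add.outer(ks, ks)) ** 1.5  # de-emphasize high modes
metric = np.sum(weights * (target - swarm) ** 2)
```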